Categorization of Turkish News Documents with Morphological Analysis
Abstract
Morphologically rich languages such as Turkish may benefit from morphological analysis in natural language processing tasks. In this study, we examine the effects of morphological analysis on the text categorization task in Turkish. We use stems and word categories extracted through morphological analysis as the main features and compare them with fixed-length stemmers in a bag-of-words approach, using several learning algorithms. We aim to show the effects of using varying degrees of morphological information.
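The fixed-length stemming baseline mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the prefix length `n=5`, the letters-only tokenizer, and the helper names (`turkish_lower`, `tokenize`, `fixed_length_stem`, `bag_of_words`) are hypothetical choices made for illustration. The sketch truncates each token to its first `n` characters and counts the resulting stems as bag-of-words features.

```python
import re
from collections import Counter


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish dotted/dotless I rules.

    Plain str.lower() maps 'I' to 'i' and 'İ' to 'i' plus a combining dot,
    so both capitals are mapped explicitly before lowercasing the rest.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text: str) -> list[str]:
    # Assumption: a token is a maximal run of Unicode letters
    # (word characters excluding digits and underscores).
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def fixed_length_stem(token: str, n: int = 5) -> str:
    # Fixed-length stemmer: keep only the first n characters.
    # n=5 is a hypothetical default; shorter tokens are kept whole.
    return token[:n]


def bag_of_words(text: str, n: int = 5) -> Counter:
    # Bag-of-words representation: count of each truncated stem.
    return Counter(fixed_length_stem(tok, n) for tok in tokenize(text))


if __name__ == "__main__":
    print(bag_of_words("Kitaplar kitapçıda satılıyor, kitabı aldım."))
```

In this example, "kitaplar" and "kitapçıda" both reduce to "kitap", but "kitabı" becomes "kitab" because truncation cannot undo the p/b consonant alternation. This is the kind of case where stems obtained through morphological analysis may group word forms that a fixed-length stemmer keeps apart.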
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Yandex School of Data Analysis approach to English-Turkish translation at WMT16 News Translation Task
We describe the English-Turkish and Turkish-English translation systems submitted by Yandex School of Data Analysis team to WMT16 news translation task. We successfully applied hand-crafted morphological (de-)segmentation of Turkish, syntax-based pre-ordering of English in English-Turkish and post-ordering of English in Turkish-English. We perform desegmentation using SMT and propose a simple y...
Personalized News Categorization Through Scalable Text Classification
Existing news portals on the WWW aim to provide users with numerous articles that are categorized into specific topics. Such a categorization procedure improves the presentation of information to the end user. We further improve the usability of these systems by presenting the architecture of a personalized news classification system that exploits the user’s awareness of a topic in order to classify th...
A New Approach for Semi-supervised Online News Classification
Due to the dramatic increase of information on the Web, text categorization has become a useful tool for organizing information. The traditional text categorization problem uses a training set from online sources with predefined class labels for text documents. Typically, a large amount of online training news must be provided in order to learn a satisfactory categorization scheme. We investigate...
Spotting scientific and technical specialization in biomedical documents using morphological clues
Distinguishing the specialization level of health documents on the Internet is an important indication, especially when documents are read by non-expert users such as patients. Indeed, a high technicity of documents prevents patients from understanding the content and may have negative consequences for their health care process and their communication with medical doctors. When medical portals...
Publication date: 2013